Flexible Harmonic/Stochastic Modeling for HMM-based Speech Synthesis
Abstract
This paper presents preliminary results of a new approach to speech modeling for statistical parametric HMM-based speech synthesis. The proposed system is based on a flexible pitch-asynchronous harmonic/stochastic model (HSM) [1]. Speech is modeled as the superposition of two components: a harmonic component and a stochastic, or aperiodic, component. Because the synthesis model is pitch-asynchronous, it can be integrated directly into an HMM-based synthesis system. HTS [2], an open-source software toolkit for HMM-based speech synthesis, was used. The proposed HSM method was compared to the HTS baseline system using the same configuration and database. A number of different experiments were conducted. Results show that the synthesized utterances reach high quality. A small perceptual test was carried out comparing the two systems on the quality of the synthetic voice and its similarity to the original voice. HSM outperforms the HTS baseline system in the quality test: HSM 53%, HTS 35.3%, and undecided 11.7%. Concerning similarity to the original voice, HSM performed slightly better than HTS: HSM 35.3%, HTS 29.4%, and undecided 35.3%.
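The superposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' HSM implementation: the function name `synthesize_frame`, its parameters, the zero-phase harmonics, and the use of unshaped white noise for the stochastic part are all assumptions made for illustration only.

```python
import math
import random


def synthesize_frame(f0, harmonic_amps, noise_gain,
                     sample_rate=16000, n_samples=160, seed=0):
    """Return one frame as harmonic part + stochastic part (illustrative only).

    All names and defaults here are hypothetical; a real system would also
    model harmonic phases and shape the noise spectrally.
    """
    rng = random.Random(seed)
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        # Harmonic component: sinusoids at integer multiples of f0 (zero phase).
        harmonic = sum(a * math.cos(2 * math.pi * (k + 1) * f0 * t)
                       for k, a in enumerate(harmonic_amps))
        # Stochastic component: white Gaussian noise scaled by a gain.
        stochastic = noise_gain * rng.gauss(0.0, 1.0)
        # The output sample is the superposition of the two components.
        frame.append(harmonic + stochastic)
    return frame


# Example: a 10 ms frame at 16 kHz with three harmonics and light noise.
frame = synthesize_frame(f0=120.0, harmonic_amps=[1.0, 0.5, 0.25], noise_gain=0.05)
```

Setting `noise_gain` to zero leaves only the harmonic part, and passing an empty `harmonic_amps` list leaves only the noise, which makes the two components easy to inspect separately.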
Similar resources
Using F0 Contour Generation Process Model for Improved and Flexible Control of Prosodic Features in HMM-based Speech Synthesis
The generation process model of fundamental frequency contours, known as Fujisaki's model, is well suited to representing global features of prosody. It is a command-response model, where the commands have clear relations with the linguistic and para-/non-linguistic information included in the utterance. Therefore, by controlling fundamental frequency contours in the framework of the generation process model, a mo...
Estimation of resonant characteristics based on AR-HMM modeling and spectral envelope conversion of vowel sounds
A new method was developed for accurately separating the source and articulation filter characteristics of speech. This method is based on AR-HMM modeling, where the residual waveform is expressed as the output sequence of an HMM. To realize an accurate analysis, a scheme for dividing HMM states was newly introduced. Using the AR-filter parameter values obtained through the analysis, we can con...
Flexible harmonic/stochastic speech synthesis
In this paper, our flexible harmonic/stochastic waveform generator for a speech synthesis system is presented. The speech is modeled as the superposition of two components: a harmonic component and a stochastic or aperiodic component. The purpose of this representation is to provide a framework with maximum flexibility for all kinds of speech transformations. In contrast to other similar systems...
Superpositional Modeling of Fundamental Frequency Contours for HMM-based Speech Synthesis
Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based ones, gain special attention from researchers because of their ability in generating speech in various voice qualities and styles. In these methods, all acoustic parameters (except durational ones) are handled in a frame-by-frame manner, which is not appropriate for prosodic features. Although relation of adja...
A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis
Speech generated by parametric synthesizers generally suffers from a typical buzziness, similar to what was encountered in old LPC-like vocoders. In order to alleviate this problem, a more suited modeling of the excitation should be adopted. For this, we hereby propose an adaptation of the Deterministic plus Stochastic Model (DSM) for the residual. In this model, the excitation is divided into ...
Publication date: 2008